COGNIFYZ TECHNOLOGIES: RESTAURANT ANALYSIS¶

                           APPLICANT NAME : JAIMIN YOGESH SHAH
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
In [2]:
df=pd.read_csv(r"D:\JAIMIN\Data Analysis\Python\Cognifyz Technologies\Dataset .csv")
In [3]:
df
Out[3]:
Restaurant ID Restaurant Name Country Code City Address Locality Locality Verbose Longitude Latitude Cuisines ... Currency Has Table booking Has Online delivery Is delivering now Switch to order menu Price range Aggregate rating Rating color Rating text Votes
0 6317637 Le Petit Souffle 162 Makati City Third Floor, Century City Mall, Kalayaan Avenu... Century City Mall, Poblacion, Makati City Century City Mall, Poblacion, Makati City, Mak... 121.027535 14.565443 French, Japanese, Desserts ... Botswana Pula(P) Yes No No No 3 4.8 Dark Green Excellent 314
1 6304287 Izakaya Kikufuji 162 Makati City Little Tokyo, 2277 Chino Roces Avenue, Legaspi... Little Tokyo, Legaspi Village, Makati City Little Tokyo, Legaspi Village, Makati City, Ma... 121.014101 14.553708 Japanese ... Botswana Pula(P) Yes No No No 3 4.5 Dark Green Excellent 591
2 6300002 Heat - Edsa Shangri-La 162 Mandaluyong City Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal... Edsa Shangri-La, Ortigas, Mandaluyong City Edsa Shangri-La, Ortigas, Mandaluyong City, Ma... 121.056831 14.581404 Seafood, Asian, Filipino, Indian ... Botswana Pula(P) Yes No No No 4 4.4 Green Very Good 270
3 6318506 Ooma 162 Mandaluyong City Third Floor, Mega Fashion Hall, SM Megamall, O... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.056475 14.585318 Japanese, Sushi ... Botswana Pula(P) No No No No 4 4.9 Dark Green Excellent 365
4 6314302 Sambo Kojin 162 Mandaluyong City Third Floor, Mega Atrium, SM Megamall, Ortigas... SM Megamall, Ortigas, Mandaluyong City SM Megamall, Ortigas, Mandaluyong City, Mandal... 121.057508 14.584450 Japanese, Korean ... Botswana Pula(P) Yes No No No 4 4.8 Dark Green Excellent 229
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9546 5915730 Namlı Gurme 208 İstanbul Kemankeş Karamustafa Paşa Mahallesi, Rıhtım ... Karaköy Karaköy, İstanbul 28.977392 41.022793 Turkish ... Turkish Lira(TL) No No No No 3 4.1 Green Very Good 788
9547 5908749 Ceviz Ağacı 208 İstanbul Koşuyolu Mahallesi, Muhittin Üstündağ Cadd... Koşuyolu Koşuyolu, İstanbul 29.041297 41.009847 World Cuisine, Patisserie, Cafe ... Turkish Lira(TL) No No No No 3 4.2 Green Very Good 1034
9548 5915807 Huqqa 208 İstanbul Kuruçeşme Mahallesi, Muallim Naci Caddesi, N... Kuruçeşme Kuruçeşme, İstanbul 29.034640 41.055817 Italian, World Cuisine ... Turkish Lira(TL) No No No No 4 3.7 Yellow Good 661
9549 5916112 Aşşk Kahve 208 İstanbul Kuruçeşme Mahallesi, Muallim Naci Caddesi, N... Kuruçeşme Kuruçeşme, İstanbul 29.036019 41.057979 Restaurant Cafe ... Turkish Lira(TL) No No No No 4 4.0 Green Very Good 901
9550 5927402 Walter's Coffee Roastery 208 İstanbul Cafeağa Mahallesi, Bademaltı Sokak, No 21/B, ... Moda Moda, İstanbul 29.026016 40.984776 Cafe ... Turkish Lira(TL) No No No No 2 4.0 Green Very Good 591

9551 rows × 21 columns

LEVEL 1 - TASK 1 - TOP CUISINES¶

Question: Determine the top three most common cuisines in the dataset¶

In [4]:
Top_3_Cuisines = df['Cuisines'].value_counts().nlargest(3).index.tolist()
In [5]:
print("The Top 3 Cuisines are: ")
for index, cuisine in enumerate(Top_3_Cuisines, 1):
    print(f"{index}. {cuisine}")
The Top 3 Cuisines are: 
1. North Indian
2. North Indian, Chinese
3. Chinese
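Note that `value_counts()` treats each full `Cuisines` string as one category, so "North Indian, Chinese" counts separately from "North Indian". A sketch (on a small made-up sample, not the actual dataset) of counting individual cuisines by splitting and exploding the column:

```python
import pandas as pd

# Hypothetical sample standing in for df['Cuisines']
cuisines = pd.Series([
    "North Indian",
    "North Indian, Chinese",
    "Chinese",
    "North Indian, Mughlai",
    "Chinese",
])

# Split multi-cuisine strings, strip whitespace, and count each cuisine once per restaurant
individual = cuisines.str.split(",").explode().str.strip()
top3 = individual.value_counts().nlargest(3)
print(top3)

# Percentage of restaurants serving each top cuisine
# (denominator is the number of restaurants, not the exploded row count)
pct = individual.value_counts() / len(cuisines) * 100
print(pct.loc[top3.index].round(2))
```

Applied to the real column, this would rank "North Indian" and "Chinese" as individual cuisines rather than as combination strings.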

Question: Calculate the percentage of restaurants that serve each of the top cuisines¶

In [6]:
percentage_cuisine = (df['Cuisines'].value_counts(normalize=True)*100).loc[Top_3_Cuisines]
print("\nPercentage of restaurants that serve each of the top cuisines:")
for cuisine, percentage in percentage_cuisine.items():
    print(f" {cuisine}: {percentage:.2f}%")

Percentage of restaurants that serve each of the top cuisines:
 North Indian: 9.81%
 North Indian, Chinese: 5.36%
 Chinese: 3.71%

LEVEL 1 - TASK 2 - CITY ANALYSIS¶

Question: Identify the city with the highest number of Restaurants in the dataset¶

In [7]:
Highest_No_Restaurants = df['City'].value_counts().idxmax()
print(f" The City with the Highest Number of Restaurants is: {Highest_No_Restaurants}")
 The City with the Highest Number of Restaurants is: New Delhi

Question: Calculate the average rating for restaurants in each city¶

Question: Determine the city with the highest average rating¶

In [8]:
Average_Rating_Restaurants = df.groupby('City')['Aggregate rating'].mean()
City_Highest_Average_Rating = Average_Rating_Restaurants.idxmax()
Highest_Average_Rating = Average_Rating_Restaurants.max()
print(f"\nThe Highest Average Rating across cities is: {Highest_Average_Rating:.2f}")
print(f"\nThe City with the Highest Average Rating is: {City_Highest_Average_Rating}")

The Highest Average Rating across cities is: 4.90

The City with the Highest Average Rating is: Inner City
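"Inner City" tops the ranking likely because cities with only a handful of restaurants can post extreme averages. A sketch (hypothetical mini-frame; the threshold of 2 is purely illustrative) of requiring a minimum restaurant count before trusting a city's mean:

```python
import pandas as pd

# Hypothetical sample: one small city with a single, highly rated restaurant
df_s = pd.DataFrame({
    "City": ["Inner City", "New Delhi", "New Delhi", "New Delhi"],
    "Aggregate rating": [4.9, 4.0, 4.2, 4.4],
})

stats = df_s.groupby("City")["Aggregate rating"].agg(["mean", "count"])
# Only trust averages backed by at least 2 restaurants (threshold is illustrative)
reliable = stats[stats["count"] >= 2]
print(reliable["mean"].idxmax())
```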

LEVEL 1 - TASK 3 - PRICE RANGE DISTRIBUTION¶

Question: Create a Histogram or Bar Chart to visualize the distribution of price ranges among the restaurants¶

In [9]:
# Bar Chart
colors = ['green', 'orange', 'red', 'blue']
plt.figure(figsize=(10,7))
df['Price range'].value_counts().sort_index().plot(kind="bar", color=colors)
plt.title(" Distribution of price range among the restaurants")
plt.xlabel('Price Range')
plt.ylabel('Number of Restaurants')
plt.xticks(rotation=0)
plt.show()
In [10]:
sns.set_style('dark')
ax = sns.countplot(x = 'Price range', data=df, palette = 'RdBu_r')
for bars in ax.containers: 
    ax.bar_label(bars)

Question: Calculate the percentage of restaurants in each price range category¶

In [11]:
Price_Range_Percentage_Restaurants = (df['Price range'].value_counts(normalize=True)*100).sort_index()
print("\nPercentage of Restaurants in Each Price Range Category:")
for price_range, percentage in Price_Range_Percentage_Restaurants.items():
    print(f" Price range {price_range}: {percentage:.2f}%")

Percentage of Restaurants in Each Price Range Category:
 Price range 1: 46.53%
 Price range 2: 32.59%
 Price range 3: 14.74%
 Price range 4: 6.14%

LEVEL 1 - TASK 4 - ONLINE DELIVERY¶

Question: Determine the percentage of restaurants that offer online delivery¶

In [12]:
Online_Delivery_Percentage_Restaurants = (df['Has Online delivery'].value_counts(normalize=True)*100).get('Yes',0)
print(f"Percentage of Restaurants that offer Online delivery: {Online_Delivery_Percentage_Restaurants:.2f}%")
Percentage of Restaurants that offer Online delivery: 25.66%

Question: Compare the average ratings of restaurants with and without online delivery¶

In [13]:
Average_Rating_With_Online_Delivery = df[df['Has Online delivery'] == 'Yes']['Aggregate rating'].mean()
Average_Rating_Without_Online_Delivery = df[df['Has Online delivery'] == 'No']['Aggregate rating'].mean()
print(f"\nAverage Rating of Restaurants with Online Delivery: {Average_Rating_With_Online_Delivery:.2f}")
print(f"\nAverage Rating of Restaurants without Online Delivery: {Average_Rating_Without_Online_Delivery:.2f}")

Average Rating of Restaurants with Online Delivery: 3.25

Average Rating of Restaurants without Online Delivery: 2.47
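In this dataset an `Aggregate rating` of 0.0 marks "Not rated" restaurants, which drags down the group means above. A sketch (made-up rows, not the real frame) of repeating the comparison on rated restaurants only:

```python
import pandas as pd

# Hypothetical sample; 0.0 stands in for "Not rated"
df_s = pd.DataFrame({
    "Has Online delivery": ["Yes", "Yes", "No", "No", "No"],
    "Aggregate rating": [4.0, 3.0, 4.5, 0.0, 0.0],
})

# Drop unrated rows before averaging so 0.0 placeholders don't bias the means
rated = df_s[df_s["Aggregate rating"] > 0]
means = rated.groupby("Has Online delivery")["Aggregate rating"].mean()
print(means)
```

On the toy data the "No" group recovers once its unrated rows are excluded; on the real data the gap between the two groups would likely narrow for the same reason.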

LEVEL 2 - TASK 1 - RESTAURANT RATINGS¶

Question: Analyze the distribution of aggregate ratings and determine the most common rating range¶

In [14]:
Rating_Counts = df['Aggregate rating'].value_counts().sort_index(ascending=False)
# Note: a rating of 0.0 marks "Not rated" restaurants, which dominate the counts
Most_Common_Rating_Range = Rating_Counts.idxmax()
print(f"Most common Rating Range is: {Most_Common_Rating_Range}")
Most common Rating Range is: 0.0
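`idxmax` on the raw values answers "most common exact rating", not "most common rating range". A sketch using `pd.cut` (illustrative sample values and half-point bins, both assumptions) to bin the ratings first:

```python
import pandas as pd

# Hypothetical ratings; 0.0 again means "Not rated"
ratings = pd.Series([0.0, 3.1, 3.4, 3.6, 4.8, 2.9, 3.3])

# Half-point bins from 0 to 5; include_lowest keeps the 0.0 placeholder in the first bin
bins = [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ranges = pd.cut(ratings, bins=bins, include_lowest=True)
print(ranges.value_counts().idxmax())
```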

Question: Calculate the average number of votes received by restaurants¶

In [15]:
Average_No_Of_Votes = df['Votes'].mean()
print(f"\nAverage Number of Votes Received by Restaurants: {Average_No_Of_Votes:.2f}")

Average Number of Votes Received by Restaurants: 156.91

LEVEL 2 - TASK 2 - CUISINE COMBINATION¶

Question: Identify the most common combinations of cuisines in the dataset¶

Question: Determine if certain cuisine combinations tend to have higher ratings¶

In [16]:
# Individual cuisines (split on ','; values are not stripped of whitespace,
# so ' Indian' and 'Indian' are counted as separate entries below)
unique_combinations = df['Cuisines'].str.split(',').explode().unique()
In [17]:
# Combination Counts
combination_counts = pd.DataFrame(columns=['Combination','Count', 'Average Rating'])
In [18]:
# Calculate Counts and Average Ratings
# Caveat: substring matching inflates counts (e.g. "Indian" also matches "North Indian")
for combination in unique_combinations:
    combination_filter = df['Cuisines'].apply(lambda x: str(x).lower().find(str(combination).lower()) != -1 if pd.notna(x) else False)
    combination_count = combination_filter.sum()
    average_rating = df.loc[combination_filter, 'Aggregate rating'].mean()
    new_row = pd.DataFrame([{'Combination': combination, 'Count': combination_count, 'Average Rating': average_rating}])
    combination_counts = pd.concat([combination_counts, new_row], ignore_index=True)
    
# Sorting DataFrame
combination_counts = combination_counts.sort_values(by='Count', ascending=False)

print("Most common Cuisine Combinations:")
print(combination_counts[['Combination', 'Count']])
print("\n Average Ratings for Cuisine Combinations:")
print(combination_counts[['Combination', 'Average Rating']])
Most common Cuisine Combinations:
        Combination Count
54           Indian  4259
7            Indian  4220
144    North Indian  3960
10          Chinese  2733
27        Fast Food  1987
..              ...   ...
218       Peranakan     1
203  Cuisine Varies     1
202         Malwani     1
249   World Cuisine     1
63              NaN     0

[250 rows x 2 columns]

 Average Ratings for Cuisine Combinations:
        Combination  Average Rating
54           Indian        2.525053
7            Indian        2.510972
144    North Indian        2.510455
10          Chinese        2.620234
27        Fast Food        2.563966
..              ...             ...
218       Peranakan        4.000000
203  Cuisine Varies        0.000000
202         Malwani        3.500000
249   World Cuisine        3.700000
63              NaN             NaN

[250 rows x 2 columns]
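The substring approach above both double-counts unstripped duplicates ("Indian" appears twice) and lets broad names absorb narrower ones. A sketch (toy rows, hypothetical `combo_key` helper) of counting exact, order-independent cuisine combinations instead:

```python
import pandas as pd

# Hypothetical sample; "A, B" and "B, A" should count as the same combination
df_s = pd.DataFrame({
    "Cuisines": ["North Indian, Chinese", "Chinese, North Indian",
                 "Chinese", "North Indian, Chinese"],
    "Aggregate rating": [4.0, 3.0, 3.5, 4.4],
})

def combo_key(s):
    # Canonical key: stripped, alphabetically sorted cuisine names
    return ", ".join(sorted(part.strip() for part in s.split(",")))

df_s["Combo"] = df_s["Cuisines"].map(combo_key)
summary = (df_s.groupby("Combo")
               .agg(Count=("Combo", "size"), Avg_Rating=("Aggregate rating", "mean"))
               .sort_values("Count", ascending=False))
print(summary)
```

With exact keys, "Chinese" on its own no longer contributes to the "North Indian, Chinese" count, so combination counts and their average ratings stay disjoint.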

LEVEL 2 - TASK 3 - GEOGRAPHIC ANALYSIS¶

Question: Plot the locations of restaurants on a map using longitude and latitude coordinates¶

Question: Identify any patterns of clusters of restaurants in specific areas¶

In [19]:
pip install folium
Requirement already satisfied: folium in d:\anaconda\anaconda3\lib\site-packages (0.17.0)
Requirement already satisfied: branca>=0.6.0 in d:\anaconda\anaconda3\lib\site-packages (from folium) (0.7.2)
Requirement already satisfied: jinja2>=2.9 in d:\anaconda\anaconda3\lib\site-packages (from folium) (3.1.2)
Requirement already satisfied: numpy in d:\anaconda\anaconda3\lib\site-packages (from folium) (1.24.3)
Requirement already satisfied: requests in d:\anaconda\anaconda3\lib\site-packages (from folium) (2.31.0)
Requirement already satisfied: xyzservices in d:\anaconda\anaconda3\lib\site-packages (from folium) (2022.9.0)
Requirement already satisfied: MarkupSafe>=2.0 in d:\anaconda\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.1.1)
Requirement already satisfied: charset-normalizer<4,>=2 in d:\anaconda\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in d:\anaconda\anaconda3\lib\site-packages (from requests->folium) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in d:\anaconda\anaconda3\lib\site-packages (from requests->folium) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in d:\anaconda\anaconda3\lib\site-packages (from requests->folium) (2023.7.22)
Note: you may need to restart the kernel to use updated packages.
In [20]:
import folium
from folium.plugins import MarkerCluster
In [21]:
map_center = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=10)

for index, row in df.iterrows():
    latitude = row['Latitude']
    longitude = row['Longitude']
    location_name = row['Restaurant Name']
    marker = folium.Marker([latitude, longitude], tooltip=location_name)
    marker.add_to(map_center)

# Save the map as an HTML file or display it
map_center.save("map.html")
#map_center
In [22]:
import plotly.express as px
fig = px.scatter(df, x='Latitude', y='Longitude', text='Restaurant Name',
                 title='Scatter Plot of Latitude vs. Longitude with Restaurant Names')

fig.update_traces(marker=dict(size=8, opacity=0.7, line=dict(width=0)))

fig.update_traces(textposition='top center')
fig.show()
In [23]:
from sklearn.preprocessing import StandardScaler
X = df[['Latitude', 'Longitude']]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Hierarchical clustering: clustermap computes the Ward linkage itself and
# creates its own figure, so no separate plt.figure() call is needed
g = sns.clustermap(X_scaled, method='ward', cmap='viridis', figsize=(10, 8))
g.fig.suptitle('Agglomerative Hierarchical Clustering')
plt.show()
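Ward linkage over ~9.5k points is expensive. A lighter way to surface geographic clusters is K-Means on the coordinates; a sketch on a tiny synthetic set of latitude/longitude pairs (the cluster count of 2 is an assumption, real data would need tuning, e.g. via the elbow method):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical coordinates: two well-separated city regions
coords = np.array([
    [28.60, 77.20], [28.61, 77.21], [28.59, 77.19],   # Delhi-like cluster
    [41.00, 29.00], [41.02, 29.03], [40.98, 28.98],   # Istanbul-like cluster
])

# Fit K-Means; random_state fixed for reproducibility
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
labels = km.labels_
print(labels)
```

The resulting labels could be fed back into the folium markers (e.g. one color per cluster) to visualize the clusters on the map.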
In [24]:
import folium
from folium.plugins import MarkerCluster
import pandas as pd
In [25]:
# Filter out rows with missing latitude or longitude values
df = df.dropna(subset=['Latitude', 'Longitude'])
In [26]:
# Create a map
average_latitude = df['Latitude'].mean()
average_longitude = df['Longitude'].mean()
restaurant_map = folium.Map(location=[average_latitude, average_longitude], zoom_start=12)
In [27]:
# Create MarkerCluster
marker_cluster = MarkerCluster().add_to(restaurant_map)
In [28]:
# Add Markers for each restaurant
for index, row in df.iterrows():
    folium.Marker(location=[row['Latitude'], row['Longitude']], popup=row['Restaurant Name']).add_to(marker_cluster)
In [29]:
# Save the map as HTML
restaurant_map.save('restaurant_map.html')
restaurant_map
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook

LEVEL 2 - TASK 4 - RESTAURANT CHAINS¶

Question: Identify if there are any restaurant chains present in the dataset¶

Question: Analyze the ratings and popularity of different restaurant chains¶

In [30]:
restaurant_chains = df['Restaurant Name'].value_counts()
In [31]:
chains = restaurant_chains[restaurant_chains > 1].index
In [32]:
chain_df = df[df['Restaurant Name'].isin(chains)].copy()
In [33]:
chain_analysis = chain_df.groupby('Restaurant Name').agg({'Aggregate rating': 'mean', 'Votes': "sum", 'Average Cost for two': 'mean'}).sort_values(by='Votes',ascending=False)
print("Restaurant Chains Analysis:")
print(chain_analysis)
Restaurant Chains Analysis:
                           Aggregate rating  Votes  Average Cost for two
Restaurant Name                                                         
Barbeque Nation                    4.353846  28142           1498.076923
AB's - Absolute Barbecues          4.825000  13400           1500.000000
Big Chill                          4.475000  10853           1500.000000
Farzi Cafe                         4.366667  10098           1516.666667
Truffles                           3.950000   9682            550.000000
...                                     ...    ...                   ...
Bikaner Misthan Bhandar            0.000000      0            250.000000
Aap Ki Khatir                      0.000000      0            400.000000
Street Cafe                        0.000000      0            300.000000
Jyoti Sweets                       0.000000      0            200.000000
Firangi Bake                       0.000000      0            300.000000

[734 rows x 3 columns]
In [34]:
# Note: this groups ALL restaurant names, so single-outlet restaurants rank here too
chain = df.groupby('Restaurant Name').agg({'Aggregate rating':'mean', 'Votes':'sum'})
chain = chain.sort_values(by='Aggregate rating', ascending=False)
top_chain = chain.head(10)

print('Top 10 Restaurant Chains:')
print(top_chain)

custom_palette = sns.color_palette("Blues_d", n_colors=len(top_chain))

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
sns.barplot(x=top_chain['Aggregate rating'], y=top_chain.index, palette=custom_palette)
plt.xlabel('Average Rating')
plt.title('Average Rating of Top Chains')

plt.subplot(1, 2, 2)
sns.barplot(x=top_chain['Votes'], y=top_chain.index, palette=custom_palette)
plt.xlabel('Total Votes (Popularity)')
plt.title('Popularity of Top Chains')

plt.tight_layout()
plt.show()
Top 10 Restaurant Chains:
                                Aggregate rating  Votes
Restaurant Name                                        
Restaurant Mosaic @ The Orient               4.9     85
Ministry of Crab                             4.9    203
Miann                                        4.9    281
Shorts Burger and Shine                      4.9    820
Milse                                        4.9    754
Yellow Dog Eats                              4.9   1252
Duck & Waffle                                4.9    706
Gaga Manjero                                 4.9     95
Mirchi And Mime                              4.9   3244
McGuire's Irish Pub & Brewery                4.9   2238
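The ranking above groups all restaurant names, so one-off restaurants with a single 4.9 rating dominate it. A sketch (toy rows) of reusing the `value_counts() > 1` chain filter from In [31] before ranking, so only true chains appear:

```python
import pandas as pd

# Hypothetical sample: one true chain and two single-outlet restaurants
df_s = pd.DataFrame({
    "Restaurant Name": ["Toit", "Barbeque Nation", "Barbeque Nation", "Solo Cafe"],
    "Aggregate rating": [4.8, 4.4, 4.2, 4.9],
    "Votes": [10934, 300, 200, 10],
})

counts = df_s["Restaurant Name"].value_counts()
chains = counts[counts > 1].index          # names appearing more than once

top_chains = (df_s[df_s["Restaurant Name"].isin(chains)]
              .groupby("Restaurant Name")
              .agg({"Aggregate rating": "mean", "Votes": "sum"})
              .sort_values("Aggregate rating", ascending=False))
print(top_chains)
```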

LEVEL 3 - TASK 1 - RESTAURANT REVIEWS¶

Question: Analyze the text reviews to identify the most common positive and negative keywords¶

In [35]:
text_reviews = df['Rating text'].value_counts()
print("Rating text are as follows: ")
print(text_reviews)

sentiment_mapping = {
    'Good': 'Positive',
    'Very Good': 'Positive',
    'Excellent': 'Positive',
    'Poor': 'Negative',
    'Not rated': 'Neutral',
    'Average': 'Neutral',
}

custom_palette = sns.color_palette("husl", len(text_reviews))
df['Sentiment'] = df['Rating text'].map(sentiment_mapping)
plt.figure(figsize=(10, 6))
bars = plt.bar(text_reviews.index, text_reviews.values, color=custom_palette)

plt.xlabel('Rating Text')
plt.ylabel('Count')
plt.title('Distribution of Rating Text')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.ylim(0, text_reviews.max() + 100)

for bar, count in zip(bars, text_reviews.values):
    plt.text(bar.get_x() + bar.get_width() / 2, count + 5, str(count), ha='center', va='bottom')

plt.show()

df['Sentiment']
Rating text are as follows: 
Rating text
Average      3737
Not rated    2148
Good         2100
Very Good    1079
Excellent     301
Poor          186
Name: count, dtype: int64
Out[35]:
0       Positive
1       Positive
2       Positive
3       Positive
4       Positive
          ...   
9546    Positive
9547    Positive
9548    Positive
9549    Positive
9550    Positive
Name: Sentiment, Length: 9551, dtype: object

Question: Calculate the average length of reviews and explore if there is a relationship between review length and rating¶

In [36]:
# No full review text exists in the dataset, so the length of the 'Rating text' label is used as a proxy
df['Review Length'] = df['Rating text'].str.len()
df['Review Length']
average_review_length = df.groupby('Sentiment')['Review Length'].mean()

average_review_length.plot(kind='bar')
plt.xlabel('Sentiment')
plt.ylabel('Average Review Length')
plt.title('Relationship Between Review Length and Sentiment')
plt.xticks(rotation=45)
plt.show()

print("Average Review Length by Sentiment:")
print(average_review_length)
Average Review Length by Sentiment:
Sentiment
Negative    4.000000
Neutral     7.729992
Positive    5.982759
Name: Review Length, dtype: float64
In [37]:
from scipy.stats import pearsonr
sentiment_mapping = {
    'Positive': 1,
    'Neutral': 0,
    'Negative': -1,
}
df['Sentiment_numeric'] = df['Sentiment'].map(sentiment_mapping)

correlation_coefficient, _ = pearsonr(df['Review Length'], df['Sentiment_numeric'])

print(f"Correlation Coefficient: {correlation_coefficient}")
Correlation Coefficient: -0.3256368670501089

LEVEL 3 - TASK 2 - VOTES ANALYSIS¶

Question: Identify the restaurants with the highest and lowest number of votes¶

In [38]:
restaurant_with_highest_votes = df.nlargest(1, 'Votes')
restaurant_with_lowest_votes = df.nsmallest(1, 'Votes')

result_df = pd.DataFrame({
    'Restaurant with High Votes': restaurant_with_highest_votes.values[0],
    'Restaurant with Low Votes': restaurant_with_lowest_votes.values[0]
}, index=df.columns)
result_df
Out[38]:
Restaurant with High Votes Restaurant with Low Votes
Restaurant ID 51705 6710645
Restaurant Name Toit Cantinho da Gula
Country Code 1 30
City Bangalore São Paulo
Address 298, Namma Metro Pillar 62, 100 Feet Road, Ind... Rua Pedroso Alvarenga, 522, Itaim Bibi, São P...
Locality Indiranagar Itaim Bibi
Locality Verbose Indiranagar, Bangalore Itaim Bibi, São Paulo
Longitude 77.640709 -46.675667
Latitude 12.979166 -23.581
Cuisines Italian, American, Pizza Brazilian
Average Cost for two 2000 55
Currency Indian Rupees(Rs.) Brazilian Real(R$)
Has Table booking No No
Has Online delivery No No
Is delivering now No No
Switch to order menu No No
Price range 4 2
Aggregate rating 4.8 0.0
Rating color Dark Green White
Rating text Excellent Not rated
Votes 10934 0
Sentiment Positive Neutral
Review Length 9 9
Sentiment_numeric 1 0
In [39]:
highest_votes_restaurant = df.loc[df['Votes'].idxmax()]
lowest_votes_restaurant = df.loc[df['Votes'].idxmin()]
print("Restaurant with the highest number of Votes:")
print(highest_votes_restaurant[["Restaurant Name", "Votes", "Aggregate rating"]])
print(" \n Restaurant with the lowest number of Votes:")
print(lowest_votes_restaurant[["Restaurant Name", "Votes", "Aggregate rating"]])
Restaurant with the highest number of Votes:
Restaurant Name      Toit
Votes               10934
Aggregate rating      4.8
Name: 728, dtype: object
 
 Restaurant with the lowest number of Votes:
Restaurant Name     Cantinho da Gula
Votes                              0
Aggregate rating                 0.0
Name: 69, dtype: object

Question: Analyze if there is a correlation between the number of votes and the rating of a restaurant¶

In [40]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='Votes', y = 'Aggregate rating', data=df)
plt.title("Correlation between Number of Votes and Restaurant Rating")
plt.xlabel("Number of Votes")
plt.ylabel("Aggregate rating")
plt.show()

Correlation_coefficient = df['Votes'].corr(df['Aggregate rating'])
print(f"\nCorrelation Coefficient between Votes and Rating: {Correlation_coefficient}")

Correlation Coefficient between Votes and Rating: 0.3136905841954117
In [41]:
votes_rating_analysis = df['Aggregate rating'].corr(df['Votes'])
print("Correlation between Aggregate rating and Votes:", votes_rating_analysis)
Correlation between Aggregate rating and Votes: 0.3136905841954117
In [42]:
corr_matrix = df.select_dtypes(["int64", "float64"]).corr()
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='YlGnBu', fmt=".2f")
plt.title("Correlation Heatmap for Numerical Columns")
plt.show()

LEVEL 3 - TASK 3 - PRICE RANGE VS. ONLINE DELIVERY AND TABLE BOOKING¶

Question: Analyze if there is a relationship between the price range and the availability of online delivery and table booking¶

In [43]:
grouped_data = df.groupby(['Price range', 'Has Online delivery']).size().unstack()

colors = ['green', 'red']
ax = grouped_data.plot(kind='bar', color=colors, width=0.8)
plt.xlabel('Price Range')
plt.ylabel('Count')
plt.title('Distribution of Online Delivery by Price Range')
plt.xticks(rotation=0)
plt.legend(title='Has Online delivery')
plt.show()
In [44]:
grouped_data = df.groupby(['Price range', 'Has Table booking']).size().unstack()

colors = ['green', 'red']
ax = grouped_data.plot(kind='bar', color=colors, width=0.8)
plt.xlabel('Price Range')
plt.ylabel('Count')
plt.title('Distribution of Table booking by Price Range')
plt.xticks(rotation=0)
plt.legend(title='Has Table booking')
plt.show()

Question: Determine if higher-priced restaurants are more likely to offer these services¶

In [45]:
grouped_data = df.groupby('Price range').agg(
    Online_delivery_percentage=('Has Online delivery', lambda x: (x == 'Yes').mean() * 100),
    Table_booking_percentage=('Has Table booking', lambda x: (x == 'Yes').mean() * 100),
)

# Plot
ax = grouped_data.plot(kind='bar', figsize=(8, 6))
plt.xlabel('Price Range')
plt.ylabel('Percentage')
plt.title('Percentage of Restaurants Offering Services by Price Range')
plt.xticks(rotation=0)
plt.legend(title='Service')
plt.show()
In [46]:
from scipy.stats import chi2_contingency
#Online Delivery
price_online_contingency = pd.crosstab(df['Price range'], df['Has Online delivery'])

chi2, p, _, _ = chi2_contingency(price_online_contingency)
alpha = 0.05
if p < alpha:
    print("There is a statistically significant association between Price range and Online delivery.")
else:
    print("There is no statistically significant association between Price range and Online delivery.")

#Table booking
price_table_booking_contingency = pd.crosstab(df['Price range'], df['Has Table booking'])
chi2, p, _, _ = chi2_contingency(price_table_booking_contingency)

if p < alpha:
    print("There is a statistically significant association between Price range and Table booking.")
else:
    print("There is no statistically significant association between Price range and Table booking.")
There is a statistically significant association between Price range and Online delivery.
There is a statistically significant association between Price range and Table booking.
In [ ]: